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Extreme Dependence Models 


Boris Beranger *| Simone A. Padoan ^ 


Abstract 


Extreme values of real phenomena are events that occur with low frequency, but can have a large 


impact on real life. These are, in many practical problems, high-dimensional by nature (e.g. Tawn 


1990; Coles and Tawn, 1991). To study these events is of fundamental importance. For this pur¬ 


pose, probabilistic models and statistical methods are in high demand. There are several approaches 


to modelling multivariate extremes as described in Falk et al. (2011), linked to some extent. We 


describe an approach for deriving multivariate extreme value models and we illustrate the main fea¬ 
tures of some flexible extremal dependence models. We compare them by showing their utility with 
a real data application, in particular analyzing the extremal dependence among several pollutants 
recorded in the city of Leeds, UK. 


1 Introduction 


Statistical analyses of extreme events are of crucial importance for risk assessment in many areas 
such as the financial market, telecommunications, industry, environment and health. For exam¬ 
ple governments and insurance companies need to statistically quantify the frequency of natural 
disasters in order to plan risk management and take preventive actions. 


Several examples of univariate analysis are available, for instance in Coles (2001). Two main 
approaches are used in applications, the block-maximum and the peak over a threshold. These are 
based on the generalized extreme value (GEV) distribution and the generalized Pareto distribution 
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(GPD), which are milestones of the extreme value theory, see e.g. Coles (2001, Ch. 3-4) and the 
references therein. 

Many practical problems in finance, the environment, etc. are high-dimensional by nature, 
for example when analyzing the air quality in an area, the amount of pollution depends on the 
levels of different pollutants and the interaction between them. Today the extreme value theory 
provides a sufficiently mature framework for applications in the multivariate case. Indeed a large 
number of theoretical results and statistical methods and models are available, see for instance the 


monographs Resnick (2007), de Haan and Ferreira (2006), Falk et al. (2011), Beirlant et al. (2006), 


Coles (2001) and Kotz and Nadarajah (2000). In this article we review some basic theoretical results 


on the extreme values of multivariate variables (multivariate extremes for brevity). With the block- 
maximum approach we explain what type of dependence structures can be described. We discuss 
the main features of some families of parametric extremal dependence models. By means of real data 
analysis we show the utility of these extremal dependence models when assessing the dependence 
of multivariate extremes. Their utility is also illustrated when estimating the probabilities that 
multivariate extreme events occur. 

The analysis of real phenomena such as heavy rainfall, heat waves and so on is a challenging 
task. The first difficulty is the complexity of the data, i.e. observations are collected over space and 


time. In this case, theory deals with extremes of temporal- or spatial-processes (e.g. de Haan and 


Ferreira, 2006, Ch. 9). Examples of such statistical analysis are Davison et al. (2012), [Davison and 


Cholamrezaee (2012), for a simple review see Padoan (2013a). This theory is closely linked to that 


of multivariate extremes presented here. The second difficulty is that the dependence of multivariate 


extremes is not always well captured by the models illustrated here. Ledford and Tawn (1996, 1997) 
have shown that in some applications a more suitable dependence structure is described by the so 
called asymptotic independence. This framework has been recently extended to continuous processes 


(e.g. De Haan and Zhou 2011 Wadsworth and Tawn, 2012; Padoan 2013c). These motivations 
make the multivariate extreme value theory a very active research field at present. 

The paper is organized as follows. In Section 1.2 a definition of multivariate extremes is provided 
and the main characteristics are presented. In Section 1.3 some of the most popular extremal 
dependence models are described. In Section 1.4 some estimation methods are discussed and in 
Section 1.5 the analysis of the extremes of multiple pollutants is performed. 
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2 Multivariate Extremes 


Applying the block-maximum approach to every component of a multivariate random vector gives 
rise to a definition of multivariate extremes. Specifically, for d G N, let I = {1,..., d} be an index 
set and X = {Xi,..., Xj) be an M'^-valued random vector with joint (probability) distribution 
function F and marginal distribution functions Fj = F{oo,... ,Xj,... ,oo), j G I. Suppose that 
are n independent and identically distributed (i.i.d.) copies of X. The sample vector 
of componentwise maxima (sample maxima for brevity) is Mn = • • ■, where = 

max(Xij,.. .,Xn,j). 

Typically, in applications the distribution F is unknown and so the distribution of the sample 
maxima is also unknown. A possible solution is to study the asymptotic distribution of as 
n ^ oo and to use it as an approximation for a large but finite sample size, resulting in an 
approximate distribution for multivariate extremes. At a first glance, this notion of multivariate 
extremes may seem too simple to provide a useful approach for applications. However, a number 
of theoretical results justify its practical use. For example, with this definition of multivariate 
extremes, the dependence that arises is linked to the dependence that all the components of X are 
simultaneously large. Thus, by estimating these dependence structures we are also able to estimate 
the probabilities that multiple exceedances occur. 


2.1 Multivariate extreme value distributions 


The asymptotic distribution of Mn is derived with a similar approach to the univariate case. 
Assume there are sequences of normalizing constants = (oni,..., and) > 0, with 0 = (0,..., 0), 
and bn = (&ni, • ■ •, bnd) £ such that 

-—^ ® ) = F^{anX -|- bn) —> G{x), n —)■ oo, (1) 

O^n J 



for all the continuity points s of a non-degenerate distribution G. The class of the limiting distri¬ 
butions in 0 is called multivariate extreme value distributions (MEVDs) ( |Resnick[ |2007[ p. 263). 
A distribution function F that satisfies the convergence result Q is said to be in the (maximum) 


domain of attraction of G (de Haan and Ferreira, 2006, pp. 226-229). An attractive property of 
MEVDs is the max-stability. A distribution G on is max-stable if for every n G N, there exists 
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sequences > 0 and bn £ such that 


G{an x + bn) = G^/"'(£c), 


( 2 ) 


(Resnick, 2007, Proposition 5.9). As a consequence, G is such that is a distribution for every 
a > 0. A class of distributions that satisfies such a property is named max-infinitely divisible (max- 
id). More precisely, a distribution G on is max-id, if for any n G N there exists a distribution 


Fn such that G = Fn (Resnick, 2007, p. 252). This means that G can always be defined through 
the distribution of the sample maxima of n i.i.d. random vectors. 

In order to characterize the class of MEVDs we need to specify: a) the form of the marginal 
distributions, b) the form of the dependence structure. 

a) To illustrate the first feature is fairly straightforward. If F converges, then so too does the 


marginal distributions Fj for all j G I. Choosing ajn and bjn for all j £ I as in de Haan and Ferreira 


( |2006 Corollary 1.2.4), implies that each marginal distribution of G is a generalized extreme value 
(GEV), i.e. 

^ ' 'Xj- 


G(oo, ... ,Xj, ..., oo) = exp 


“ 1 1 + ii 


(Ji 


-i/«i 


, j 


where (x)+ = max(0,x), —oo < yijiij < oo, (Tj > 0 (de Haan and Ferreira 


2006 


pp. 208-211). 


Because the marginal distributions are continuous then G is also continuous. 

b) The explanation of the dependence form is more elaborate, although it is not complicated. 
The explanation is based on three steps: 1) G is transformed so that its marginal distributions are 
equal, 2) a Poisson point process (PPP) is used to represent the standardised distribution, 3) the 
dependence form is made explicit by means of a change of coordinates. Here are the steps. 

1) Let Uj{a) = FF(\ — 1/a), with a > 1, be the left-continuous inverse of Fj, for all j G L The 
sequences and bnj in ([^ are such that for all yj >0, 

lim UAnyV-K ^ Viyf-i) ^ 

n^oc an F 


and therefore 


lim F^{Ui{nyi ),..., Ud{nyd)} 


= G 



6 


+ /^i) 


- 1) 



Go{y), 


(3) 
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for all continuity points y > 0 of Gq (see de Haan and Ferreira, 2006, Theorems 1.1.6, 6.1.1). Go is 


a MEVD with identical unit Frechet marginal distributions. 

Now, for all y > 0 such that 0 < Go{y) < 1, by taking the logarithm on the right and left 
side of and using a first order Taylor expansion of log F{?7i(nyi),..., Ud{nyd)}, as n —>■ oo, it 
follows that 

lim n[l - F{Ui{nyi),.. .,Ud{nyd)}] = -logGo(y) = F(y). (4) 

The function V, named exponent (dependence) function, represents the dependence structure of 
multiple extremes (extremal dependence for brevity). According to Q the derivation of V depends 
on the functional form of F. In most of the practical problems the latter is unknown. A possible 
solution is obtained exploiting the max-id property of Gq, which says that every max-id distribution 


permits a PPP representation, see Resnick (2007, pp. 257-262) and Falk et al. (2011 pp. 141-142). 
2) Let Nn(-) be a PPP defined by 


iV4^):=£n|pj(^), n{p,}(-4) 


i=l 


1, Fi e A, 
0, Pi i A, 


where .A C A with A := (0, oo) x M^, 


P^ = 


i (— bni 

-,a+ei ^ 

n I V a„i 


^ _j_ ^ ^ bnd 

\ 0,nd 


for every n G N and Xi, i = 1,2,... are i.i.d random vectors with distribution F. The intensity 
measure is Q x pn where C is the Lebesgue measure and for every n G N and all critical regions 
defined by By := M(^\[0,y] with y > 0, 

Vn{By) = n[l - F{Ui{nyi),Ud{nyd)}], 

is a finite measure. If the limit in ([^ holds, then converges weakly to as n —>■ oo, i.e. a PPP 
with intensity measure C x y where 

y{By) = y{{v G M+ : ui > yor.. .oru^ > yd)} = V{y), y > 0, 


is a fine measure, named exponent measure (see jde Haan and Ferreira 2006, Theorems 6.1.5, 6.1.11). 
Observe that y must concentrate on M = M(|_\{0} in order to be uniquely determined. Also, y must 
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satisfy r]{oc) = 0, see Falk et al. (2011, p. 143) for details. 

This essentially means that numbering the rescaled observations that fall in a critical region, 
e.g. see the shaded sets in the left panels of Figure where at least one coordinate is large, makes 
it possible for to be computed using the void probability of N, that is 


Go(r/)=pr[iV{(0,l] xi3^} = 0] 

= exp(-[C{(0,1]} X r]{By)]) (5) 

= exp{-l/(y)} y > 0. 

From Figure we see that in the case of strong dependence (top-left panel) all the coordinates of the 
extremes are large, while in the case of weak dependence (bottom-left panels) only one coordinate 
of the extremes is large. 

At this time it remains to be specify the structure of the exponent measure. This task is simpler 
to fulfil when working with pseudo-polar coordinates. 

3) With unit Frechet margins, the stability property ([^ can be rephrased by GQ{ay) = Go{y) 
for any a > 0, implying that rj satisfies the homogeneity property 


r]{aBy) = y{By)/a, ( 6 ) 

for all By C M, where By := M\(0,y] with y > 0. Note that for a Borel set C M we have 
aB = {av : u G B} and Bay = aBy. Now, let 


W := (u G M : ui -k ... -|- Urf = 1), 


be the unit simplex on M (simplex for brevity), where d — 1 variables are free to vary and one is 
fixed, e.g. = 1 — (ui -|- • • • -k Vd-i)- For any v G M+, with the sum-norm, ||u|| = |ui| -k • • • -k |ud|, 
we measure the distance of v from 0. Other norms can also be considered (e.g. Resnick 2007[ 
pp. 270-274). We consider the one-to-one transformation Q : M —>• (0,oo) x W, given by 


(r,m) := Q(u) = (||u||, ||u|| ^u), uG 


By means of this, the induced measure \s 'll} ■.= y * Q, i.e. '0(Wr) = f?{<3'*~(Wr)} for all sets 
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Wr = r X W with r > 0 and W C W, is generated. Then, from the property Q it follows that 


ipiWr) = ri{{v G M : ||t;|| > r,i)/||r;|| G W)} 

= r]{{ru G M : ||it|| > l,w/||u|| G W)} 


where H'(W) := rj{(u G M : ||it|| > l,it/||ii|| G W)}. The benefit of transforming the coordinates 
into pseudo-polar is that the measure r] becomes a product of two independent measures: the radial 
measure (l/r) and spectral measure or angular measure {H') (e.g. Falk et ah, 2011, p. 145). The 


first measures the intensity (or distance) of the points from the origin and the second measures 
the angular spread (or direction) of the points. This result is known as the speetral decomposition 


(de Haan and Resnick, 1977). Hereafter we will use the term angular measure. 

The density of V’ is dV’(r, w) = r“^dr x dH'{w) for all r > 0 and w G W, by means of which 
we obtain the explicit form 


r]{By) = ^p{Q{v G M : > yi or... or Ud > Vd)} 

= il>[{{r, w) G (0, oo) X W : r > mm{yj/wj,j G /)}] 
r~^drdH' (w) 

= / max {wj/Hj) dH'(w). 

Jw 


(7) 


In pseudo-polar coordinates, extremes are the values whose radial component is higher than a 
high threshold, see the red points in the middle panels of Figure The angular components are 
concentrated around the center of the simplex, in the case of strong dependence (middle-top panel), 
while they are concentrated around the vertices of the simplex (middle-bottom panel), in the case 
of weak dependence. 

The measure H' can be any finite measure on W satisfying the first moment conditions 


Wj dH'{w) = 1, y j G I. 


This guarantees that the marginal distributions of Go are unit Frechet. If H' satisfies the first 
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Figure 1: Examples of critical regions in (left-panels) and its representation in pseudo-polar co¬ 
ordinates (middle-panels). Red points are the extremes with strong (top-panels) and weak (bottom- 
panels) dependence. Right panels display the angular densities on the simplex. 


moment conditions, then the total mass is equal to 


H'{W) = f {wi Wd)dH'(w) = f WjdH'{w) = d 

Jw Jw 


So setting H := H'/H'(W), then H is a probability measure satisfying 


WjdH{w) = l/d, y j e I. 


( 8 ) 


Concluding, combining Q, Q, Q and Q all together, we have that a MEVD with unit Erechet 
margins is equal to 

Go{y) = exp -d / max{wj/yj) dH{w) I . (9) 

I Jw ie-f J 

2.2 Angular densities 

The measure H can place mass on the interior as well as on other subspaces of the simplex, such 
as the edges and the vertices. Thus H can have several densities that lie on these sets, which are 


named angular densities. Coles and Tawn (1991) described a way to derive the angular densities 










































when G is absolutely continuous (see also Resnick, 2007, Example. 5.13). 


Specifically, let S := P(/)\0, where P(/) is the power set of I and S be the index set that takes 
values in S. Given fixed d, the sets 

^d,S = {w ^ W : Wj = 0, if j ^ S] Wj > 0 if j G S), 

for all 5 G S provide a partition of W in 2*^ — 1 subsets. Similar to the simplex, there are k — 1 
variables Wj in Wd,s that are free to vary, where j G S and A: = |5| denotes the size of S. We denote 
by 5 the density that lies on the subspace Wd,s, where 5 G S. When the latter is a vertex ej of 
the simplex W, for any j G /, then the density is a point mass, that is hd,s = H{{ej}). 

Let S = {ii,... ,ik} C I, when Go is absolutely continuous the angular density for any y G 
is 

Vh Vik-i ^ (^ \ .. d'^V 


hd,s 


Ylies yi 


Y.yi 

\i&S / 


dyi^ • • • dyi^ 

as 


{y)- 


( 10 ) 


Two examples of a tridimensional angular density in the interior of the simplex are reported in the 
right panels of Figure These are the densities of a symmetric logistic model (Gumbel, 1960) with 
a strong and weak dependence. When S = {i} for any i G I the angular density hd^s represents the 


mass of H at the vertex e,- with j = i, thus (10) reduces into 


hd,s = H{{ei}) = -yf^ lim ^{v)- 


( 11 ) 


In the bivariate case these results are equal to the ones obtained by Pickands (1981). Kotz and 


Nadarajah (2000) discussed the bivariate case in the following terms. With d = 2 the unit simplex 


W = [0,1] can be partitioned into 


W2,{1} = {(1,0)}, W2,{2} = {(0, 1)1, W2,{1,2} = {(w^, 1 - w'),w^ G (0,1)}. 


The densities that he on them are 


.2., dV 


h2,{i} = ^({0}) = -2/1 hm ^(2/1,2/2), 

BV 

^2,(2} = ^({1}) = -2/i 


yi- 
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and 


-—^(u;,l -u;). 
oyi dy 2 


^2,{i,2}(w^) = - 

respectively, for any yi,y 2 > 0. The first two densities describe the case when extremes are only 
observed in one variable. While the third density describes the case when extremes are observed in 
both variables. 


2.3 Extremal dependence 

From ([^ it emerges that the extremal dependence is expressed through the exponent fnnction. This 
is a map from to (0, oo) satisfying the properties: 

1. is a continuous function and homogeneons of order —1, the latter meaning that V{ay) = 
a~^V{y) for all o > 0; 

2. is a convex function, that is V{ay + (1 — a)y') < aV{y) + (1 — a)V{y'), for a G [0,1] and 

y,y' 


3. max(l/yi,..., 1/yd) < E(y) < {1/yi + ... + l/yd), with the lower and upper limits repre¬ 
senting the complete dependence and independence cases respectively. 


See de Haan and Ferreira (2006, pp. 223-226) for details. In summary, let E be a random vector 
with distribution ([^. When H places the total mass 1 on the center of the simplex {1/d,... ,1/d), 
then Yi = Y 2 = ■■■= Yd almost surely and hence Go{y) = expjmax (1/yi,..., 1/yd)}. When H 
places mass l/d on for all j G I, i.e. the vertices of the simplex, then Yi,... ,Yd are independent 
and hence Go{y) = exp(l/7/i -|- ... + 1/yd)- This rephrased for a random vector X with distribution 
0 becomes 


min{G'i(xi),..., Gdixd)} < G{x) < G'i(xi) • ... • Gdixd), x G W^. 


In order to visualise the exponent function more easily, its restriction in the simplex is nsually 


considered. This is a function ^ : W —)• [l/d, 1], named the Pickands dependence function (Pickands 


1981), defined by 


A{t) := d / msiK{wjtj)dH{w), 

Jw ie/ 


where Zj = l/yj, j & I, tj = Zj/{zi + ■ ■ ■ + Zd) with j = 1,... ,d — 1 and = 1 — (fi + • • • + 
td-i). A inherits the above properties from V with the obvious modifications. In particular, 1/d < 
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max(ti ,... ,td) < A{t) < 1, where lower and upper bounds represent the complete dependence and 
independence cases, and for the homogeneity property of A the exponent function can be rewritten 
as 

V{z) = {zi-\ - \-Zd)A{ti,...,td), zGM+. 

The exponential function can be profitably used in several ways. First, an important summary 
of the extremal dependence is given by 


'd = V{1,... ,1) = d / max{wj)dH{w). 

Jw is/ 


( 12 ) 


This is named the extremal coefficient (Smith, 1990) and it represents the (fractional) number of 


independent components of the random vector Y. The coefficient takes values in [l,(i], depending 
on whether the measure H concentrates near the center or the vertices of the simplex. The bounds 
regard the cases of complete dependence and independence. 

Second, for any y > 0 and failure region 


Ty = {v G R : vi > Hi and... and Vd > Ud), 


(13) 


the tail dependence function (Nikoloulopoulos et ah, 2009; de Haan and Ferreira, 2006 p. 225) is 
defined by 

Riy) ■■= yiiv G R : vi > yi and... and Vd > yd)} = y > 0. 


This counts the number of observations that fall in the failure region, i.e. all their coordinates 
are simultaneonsly large. The tail dependence function is related to the exponent function by the 
inclnsion-exclnsion principle. Using similar argnments to those in ([^ and ([^ it follows that 


R{y) = d / mm{wj/yj)dH{w) y > 0. (14) 

Jw 

By means of the tail dependence fnnction, another important summary of the dependence between 
the components of Y is obtained. The coefficient of upper tail dependence is given by 


X = R{1, ...,l) = d 


mm{wj)dH (w) 
' 16 / 


(15) 
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It measures the strength of dependence in the tail of the distribution of Y or in other terms the 
probability that all the components of Y are simultaneously large. This coefficient was introduced 


in the bivariate case by Joe (1997, Ch. 2) and extended to the multivariate case by Li (2009). When 
H concentrates near the center or on the vertices of the simplex, then x > 0 or x = 0 respectively. 
In these cases we say that Y is upper tail dependent or independent. 

In addition, the exponent and the tail dependence functions can be used for approximating the 
probability that certain types of extreme events will occur. Specifically, let be a random vector 
with unit Pareto margins. F is in the domain of attraction of a MEVD with Frechet margins. From 
(Q and for the homogeneity property of V we have that {1 — F(ny)} ~ V{ny) for large n. Then, 
for the relations 0 and ([^, the approximating result follows 


pr(yi > yi or ... or Y^ > Vd) ^ d / max (wj/yj) dH{w), 

Jw fe/ 


(16) 


when yi,... ,yd are high enough thresholds. Furthermore, with similar arguments to those in Section 
o we have that 

lim nF{nyi,..., nyd) = R{y), 


where F is the survivor function of Y. R has the same homogeneity property of V. Hence, F(ny) 


R{ny) for large n. Then, for the relation (14), the approximating result also follows 


pr(Ti > yi and... and Yd > yd) ^ d / min (wj/yj) dH{w), 

Jw fe/ 


(17) 


when yi,... ,yd are high enough thresholds. 

Lastly, when x = 0 the elements of Y are independent in the limit. However, they may still 


be dependent for large but finite samples. Ledford and Tawn (1996) proposed another dependence 


measure in order to capture this feature. For brevity, we focus on the bivariate case. Suppose that 
F for y —7- oo satisfies the condition 


F{y,y) - y o < r < i, 


where £ is a slowly function, i.e. C{ay)/C{y) — )■ 1 as y —)■ oo for any a > 0. Then for large y, 
assuming C constant, different tail behaviours are covered. The case x > 0 is reached when r = 1 
and so the variables are asymptotically dependent. When 1/2 < r < 1 this means that x = 0 ^md 
so the variables are asymptotically independent, but they are still positively associated and the 
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value of r expresses the degree (see Ledford and Tawn, 1996| for details). 


3 Parametric models for the extremal dependence 

From the previous sections, it emerges that both the exponent and tail dependence functions 
depend on the angular measure. There is no unique angular measure that generates the extremal 
dependence, any finite measure that satisfies the first moment conditions is suitable. In order to 
represent the extremal dependence, in principle it is insufficient to use a parametric family of 
models for the distribution function of the angular measure. However, flexible classes of parametric 


models can still be useful for applications, e.g. see Tawn (1990), Coles and Tawn (1991) and Boldi 


and Davison (2007) to name a few. To this end, in previous years different parametric extremal 


dependence models have been introduced in the literature. A fairly comprehensive overview can be 
found in Kotz and Nadarajah (2000, Section 3.4), Coles (2001, Section 8.2.1), Beirlant et al. ( 2006| 


Section 9.2.2) and Padoan (2013b). In the next sections we describe some of the most popular 
models. 


3.1 Asymmetric logistic model 


The multivariate asymmetric logistic model is an extension of the symmetric, introduced by Tawn 
(1990) (see also Coles and Tawn| 1991) for modelling extremes in complex environmental applica¬ 
tions. 


Let S and S as in Section 2.2 and Ng be a Poisson random variable with rate l/r^. This describes 


the number of storm events, ns, that takes place on the sites 5 in a time interval. Given ns, for 
any site j £ S, let {Xj^s^i^i = be a sequence of i.i.d. random variables that describe 

an environmental episode such as rain. For a hxed i, {Xj^s-,i}jeS is assumed to be a dependent 
sequence. The maximum amount of rain observed at j is Xj^s = Let be 

a random effect with a positive stable distribution and stability parameter > 1 ( |Nolan 2003), 
representing an unrecorded additional piece of information on storm events. Assume 
as an independent sequence. Define Yj = max^gg^-lAj^^}, where Sj C S contains all nonempty 
sets including j and so the maximum is over all the storm events involving j. Then, the exponent 
function of the joint survival function of (Yi,...,!^), after transforming the margins into unit 
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exponential variables, is 


cSeS jes 


r i/«s , 

"•s >> , y e 


where 6 = {as, Pj,s}se§, as > I, /3s = rs/Yls&Sj'^s and /3j^s = 0 if j ^ 5, and for j E I, 
0 < /3j, 5 < 1 and Yls&sl^j,s ~ parameter /3j^s represents the probability that the maximum 

value observed at j is attributed to a storm event involving the sites of S. The number of the model 
parameters is 2^~^{d + 2) — (2d + 1). 


In this case the angular measure places mass on all the subspaces of the simplex. From (10) it 
follows that the angular density is, for every 5 E S and all w E s equal to 


k-l 


hd,s{w; 6) = ]^(ia5 - 1) H 


as,,-(as+l) / k 


S^j 


i=l 


j&s 






j&s 


When S = I, as = a, f3j^s = f3j and so the angular density on the interior of the simplex simplifies 
to ^ ^ 

h(w-e) = Yl{ia-l)YlpfwJ^^^^^'^(Pj/wj)^y , w£W. 


i=l 


"J ""j \ 

16/ j&I 


When S = {j}, for all j E I, then from (11) it follows that the point mass at each extreme point 
of the simplex is hd^s = (3j,s- 

For example in the bivariate case, the conditions on the parameters are + /3i,{i,2} = 1 and 
/32,{2} + /52 ,{i,2} = 1, so the masses at the corners of S 2 = [0,1] are given by h 2 ,{i}. = 1 — /3i and 
^ 2 ,( 2 } = 1 ~ /32, where for simplicity /3i,{i,2} = I3i and /? 2 ,{i, 2 } = h, while the density in the interior 
of the simplex, for 0 < u) < 1, is 


/^2,{i,2}H = («- imi32r{w{i - w)r-\pr(i - w)r + 

The top row of Figure illustrates some examples of trivariate angular densities for dif¬ 
ferent values of the parameters 0 = (a,/3i,/? 2 ,/^s), where the subscript of the index set 
S = {1,2,3} has been omitted for simplicity. The values of the parameters are, from left to 
right {(5.75,0.5, 0.5,0.5); (1.01, 0.9, 0.9,0.9); (1.25, 0.5,0.5,0.5); (1.4,0.7, 0.15, 0.15)}. The first panel 
shows that with large values of a and equal values of the other parameters, the case of strong de¬ 
pendence among the variables is obtained. The mass is mainly concentrated towards the center of 
the simplex. The second panel shows that when a is close to 1 and the other parameters are equal. 
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Figure 2: Examples of trivariate angular densities for the Asymmetric Logistic, Tilted Dirichlet, 
Pairwise Beta, Hiisler-Reiss and Extremal-t models from top to bottom. 
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the case of weak dependence is obtained. The mass is concentrated on the vertices of the simplex. 
The third panel shows the case of a symmetric dependence structure with the mass near the corners 
of the simplex but not along the edges. Finally, the fourth panel shows a case of an asymmetric 
dependence structure where the mass tends to be closer to the components whose corresponding 
values of (5 are high. 


3.2 Tilted Dirichlet model 

Extremal dependence models with an angular measure that places mass on the interior, vertices 
and edges of the simplex are more flexible than those with a measure that concentrates only on 
the interior. An example is the asymmetric logistic model versus the symmetric. However, the 
former has too many parameters to estimate, so parsimonious models may be preferred. In order to 
derive a parametric model for the angular density whose mass concentrates on the interior of the 


simplex. Coles and Tawn (1991) proposed the following method. Consider a continuous function 


h' : W —>■ [0, oo) such that rrij = Jg Vj h\v)dv < oo for all j G I. Then, the function 


h{w) = d + • • • + mdWd) h'{mw/{miwi + • • • + mdWd)},w G W 


is a valid angular density. It satisfies the first moment conditions (|^ and its mass is centered at 
{1/d,... ,1/d) and integrates to one. For example, if h' is the density of the Dirichlet distribution, 
then we obtain the angular density 


h{w, 6) 


diEj&i ajWj)<i+^ F(aj) \Ejei ^jWj 


aj-l 

,w gW, 


(18) 


where 6 = {aj > 0}jg/. This density is asymmetric and it becomes symmetric when ai = ■ ■ ■ = ad. 
Extremes are independent or completely dependent when for all j G I the limiting cases aj 0 


be interpreted as the asymmetry and intensity of the dependence between pairs of variables. 

In this case, the exponent function can not be analytically computed, nonetheless it can still be 
evaluated numerically. 

The second row of Figure illustrates some examples of trivariate angular densities obtained 
with different sets of the parameters 0 = {ai, 02 , as)- The plots from left to right have been obtained 


and aj —)> 00 arise. The dependence parameters aj, j G I, are not easy to interpret. However, Coles 


and Tawn (1994) draw attention to the quantities ri = (a^ — a j)/2 and r 2 = (a* + aj)/2 which can 
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using the parameter sets {(2, 2,2); (0.5, 0.5, 0.5); (2, 2.5, 30); (0.1,0.25,0.95)}. The first panel shows 
that, when values of the parameters are equal and greater than 1, the mass concentrates in the center 
of the simplex leading to strong dependence. The second panel shows the opposite, when values of 
a are equal and less than 1, it yields to the case of weak dependence as the mass concentrates on 
the vertices of the simplex. The third panel shows the case of an asymmetric dependence structure 
and this is obtained when the values of the parameters are all greater than one. In this specific 
case the mass tends to spread towards the bottom and top left edges. The fourth panel illustrates 
another case of an asymmetric dependence structure, in this case obtained with all the values of 
the parameters that are less than 1, leading to a mass that concentrates along the top right edge 
and vertices. 


3.3 Pairwise beta model 


The tilted Dirichlet model has been successfully used for applications (e.g. Coles and Tawn, 1991), 


although it suffers from a lack of interpretability of the parameters. Cooley et al. (2010) proposed 


a similar model but with easily interpretable parameters. The definition of their model is based on 
a geometric approach. Specifically, they considered the symmetric pairwise beta function 


h*{wi,Wj) = 


Wi 


r^(/3i,i) + wj 


Wi 


Wi + Wj 


Pi, 3 1 


,hj e I, 


where Wi and Wj are two elements of w and j3ij > 0. This function has its center at the point 
(1/d,..., 1/d) and it verifies the first moment conditions Then, the angular pairwise beta 
density is defined by summing together all the d(d — l)/2 possible pairs of variables, namely 


h{w; 0) 


2(d-3)!r(ad+l) 
d(d-l)r( 2 a + l)r{a(d- 2 )} 


^ h{wi,Wj), 


w G W, 


where 


h{wi,Wj) = {wi + Wj)"^"^ ^{1 - {wi + Wj)}°‘^'^ h*{wi,Wj) 


and 6 = {a, {I3ij}ij^i) with a > 0. Each parameter /3ij controls the level of dependence between the 
and the components and the dependence increases for increasing values of jSij. The function 
h* is introduced to guarantee that the dependence ranges between weak and strong dependence. The 
parameter a controls the dependence of all the variables, when it increases the overall dependence 
increases. 
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Also in this case the exponent function can not be computed in closed form and hence it can 
only be evaluated numerically. 

The third row of Figure provides some examples of trivariate angular densities obtained 
with different values of the parameters 0 = (a,/3i^2;/3i,3,/32,3)- The plots from left to right have 
been obtained using the parameter sets {(4, 2, 2, 2); (0.5,1,1,1); (1, 2,4,15); (1,10,10,10)}. The first 
panel shows a case of symmetric density obtained with all equal parameters /3ij i,j € I. A large 
value of the overall dependence parameter a pulls the mass towards the center of the simplex, 
indicating a strong dependence between the variables. On the contrary, the second panel shows that 
when the overall dependence parameter is close to zero then the mass concentrates on the vertices 
of the simplex, indicating weak dependence among the variables. The third panel illustrates a case 
of asymmetric angular density with strong dependence between the second and third variables that 
is due to a large value of 132,3■ Although the value of the global dependence parameter a is not 
large, it is enough to slightly push the mass towards the center of the simplex. The fourth panel 
shows a case of symmetric angular density, which is obtained with large values of the pairwise 
dependence parameters and an average value of the global dependence parameter. The mass is 
mainly concentrated on the center of the simplex and some mass tends to lie near the centers of 
the edges. 


3.4 Hiisler-Reiss model 


One of the most popular models is the Hiisler-Reiss (Hiisler and Reiss, 1989). Let Xi ,be 


n i.i.d. copies of a zero-mean unit variance Gaussian random vector. Assume that for all i,j € I 
the pairwise correlation Pij-n satisfies the condition 


lim logre(l 

n^oo 




AT G [0,oo). 


Then, the exponent function of the limit distribution of bn{Mn — bn) for n —)> oo, where bn = 
{bn ,..., bn) is a vector of real sequences (see Resnick, 2007[ pp. 71-72), is 


viy.o) = 

1=1 


+ 


logyi/yj 


2A 






y e 




(19) 


where 0 = {Ajjjjjg/, Ij ■= I \ {yj, 4>rf_i is d — 1 dimensional Gaussian distribution with partial 
correlation Aj. For all j G I, the elements of Aj are Xk,r,j = {Xlj + A^ - Xl -)/{2Xk,jXij), for 


18 













G Ij. The parameter Xij, i,j G I, controls the dependence between the and elements of 
a vector of d extremes. These are completely dependent when Xij = 0 and become independent as 


yij 


oo. 


In this case the angular measure concentrates on the interior of the simplex. Applying (10) it 


can be checked (Engelke et al. 2015) that the angular density is 


h{w]e) = cj)d-i 



log Wj/Wl 
2Ai4 





i=2 


-1 

, It) G W, 


where 4>d-i is d — 1 dimensional Gaussian density with partial correlation matrix Ai. 

The second last row of Figure [^provides some examples of trivariate angular densities obtained 
with different values of the parameters 6 = (Ai ^25 Ai^s, A 2 , 3 ). The plots from left to right have been 
obtained using the parameter sets {(0.3,0.3, 0.3), (1.4,1.4,1.4), (1.7, 0.7,1.1), (0.52, 0.71, 0.52)}. The 
first panel shows that with small and equal values of parameters the case of strong dependence 
among all the variables is obtained. In this case the mass concentrates around the center of the 
simplex. On the contrary, the second panel shows that with large and equal values of the parameters 
the case of weak dependence is obtained. In this case the mass is placed close to the vertices of 
the simplex. The third panel shows that an asymmetric dependence structure is obtained when the 
parameter values are different. In this case the mass tends to concentrate around the vertices and 
edges that are concerned with the smaller values of the parameters. The fourth panel shows that 
a symmetric dependence structure, with respect to the second component is obtained setting the 
values of two parameters to be equal. In this case the mass is equally divided up towards the two 
vertices and edges that are concerned with the smaller values of the parameters. 


3.5 Extremal-t model 


The extremal-f model (Nikoloulopoulos et al., 2009) is more flexible than the Hiisler-Reiss but it 


is still simple enough. It is easily interpretable and useful in practical applications (see Davison 


et al., 2012). Let Xi ,..., Xn be n i.i.d. copies of a zero-center unit scale Student-t random vector 


with dispersion matrix S and 12 > 0 degrees of freedom (d.f.). Then, the exponent function of the 
limiting distribution of Mn/dn for n —)• 00 , where = (a^ ..., Un) is a vector of positive sequences 
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(see Demarta and McNeil, 2005), is 


V{y;e) = ^-Td-i,,+v 

3=1 


u + 1 


I- p, 




Pi,3 


-I i&I 



( 20 ) 


for all y G where 6 = ({pjjjjjg/, v) and Td-i^y+\ is a d — 1 dimensional Student-f distribution 
with V + 1 d.f. and partial correlation matrix Sj. The correlation parameter pij, i,j G I, drives 
the dependence between pairs of variables with the dependence that increases with the increasing 
of pij. The parameter i3 controls the overall dependence among all the variables. For decreasing 
values of u the dependence increases and vice versa. 

The Hiisler-Reiss model is a special case of the extremal-t. Indeed, for all i,j G / if the correlation 
parameters of the extremal-t distribution are equal to Pij;y = 1 — Xfjiv, then this distribution 


converges weakly, as —)■ oo, to the Hiisler-Reiss (see Nikoloulopoulos et ah, 2009). 

In this case the angular measure places mass on all the subspaces of the simplex. When S = I, 


then applying (10) we obtain that the angular density is 


h{w, 6) 



w G W, 


where is d—1 dimensional Student-t density with partial correlation matrix Si (e.g. Ribatet 


2013). When S = {j}, then applying © we obtain that the mass on the extreme points of the 
simplex is 


^d,s — Tr 






j e I- 


The last row of Figure provides some examples of the trivariate angular densities obtained 
with different values of the parameters 6 = {pip, pi, 3 , P 2 , 3 , i^)- From left to right the plots are 
obtained using the parameter values {(0.95, 0.95,0.95,2); (—0.3,—0.3,—0.3, 5); (0.52,0.71,0.52, 3); 
(0.52,0.71,0.52,2)}. The first panel shows that when the scale parameters pij are all equal and 
close to one and the d.f. v are small, then the mass concentrates around the center of the simplex 
and therefore the dependence is strong. The second panel shows the opposite, when the correlations 
are close to zero and the d.f. are high, the mass concentrates around the vertices of the simplex 
and hence the dependence is weak. The third panel shows that when two scale parameters are 
equal then the dependence structure is symmetric with respect to the second component and the 
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mass tends to concentrate on the top vertex and the bottom edge and vertices. The fourth panel 
shows that with the same setting but with smaller d.f. the mass is pushed towards the center of 
the simplex and hence the dependence is stronger. 


4 Estimating the extremal dependence 


Several inferential methods have been explored for inferring the extremal dependence. Nonpara- 
metric and parametric approaches are available. In the first case recent advances are [Gudendorf 


and Segers (2011), Gudendorf and Segers (2012) and Marcon et al. (2014), see also the references 


therein. Both likelihood based and Bayesian inferential methods have been widely investigated. 


Examples of likelihood based methods are the approximate likelihood (e.g. Goles and Tawn, 1994 


Cooley et ah, 2010 Engelke et al., 2015) and the composite likelihood (e.g. Padoan et al., 2010 


Davison and Gholamrezaee, 2012). Examples of Bayesian techniques are Apputhurai and Stephen¬ 


son 


(2011), Sabourin et al. (2013), Sabourin and Naveau (2014). 


For comparison purposes in the next section the real data analysis is performed using the 
maximum approximate likelihood estimation method and the approximate Bayesian method based 
on the approximate likelihood. Here is a brief description. 

From the theory in Sections ! 


2.1 


if 1^1,..., Yn are i.i.d. copies of Y on Ml with a distribution in 


the domain of attraction of a MEVD, then the distribution of the sequence {Ri/n, VEj, i = 1,... ,n}, 
where Ri = Fjp Yi^d and Wi = YijRi^ converges as n —>• oo to the distribution of a PPP 

with density dV'(r, m) = r“^dr x dH{w). 

Assume that xi,... ,Xn are i.i.d. observations from a random vector with an unknown distribu¬ 
tion. Since the aim is estimating the extremal dependence, we transform the data into the sample 
Ui,... with unit Frechet marginal distributions. This is done by applying the probability in¬ 
tegral transform, after fitting the marginal distributions. Next, the coordinates of the data-points 
are changed from Euclidean into pseudo-polar by the transformation 


ri = yi,i-\ - \-yi,d Wi = yjri, i = 

Then, the sequence {{ri,Wi),i = l,...,n : ri > tq}, where tq > 0 is a large threshold, comes 
approximately from a Poisson point process with intensity measure if:. Let W^o = {(r, m) : r > ro} 
be the set of points with a radial component larger than rg, then the number of points falling in Wro 
is given by N{Wro) ~ Pois{l/'i/)(Wro)}. Conditionally to N{Wro) ~ points i = 
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1,... ,m} are i.i.d. with common density d^p{r,w)/'ll;(yVrQ)■ If we assume that H is known apart 
from a vector of unknown parameters 6 € Q gMP, then the approximate likelihood of the excess is 


L{e;{r^i),w^i)),i = = - — - [[ 


ml 




oc 




( 21 ) 


i=l 


where /i is a parametric angular density function (e.g. Engelke et ah, 2015; Beirlant et ah, 2006 


pp. 170-171). In the next section the angular density models described in Section]^ are fitted to 


the data by the maximization of the likelihood (21). For brevity the asymmetric logistic model is 


not considered since it has too many parameters. The likelihood (21) is proportional to the product 


of angular densities, therefore the maximizer of (21) is obtained equivalently by maximizing the 
log-likelihood 

m 

= ( 22 ) 


2=1 


Denote by 6 the maximizer of ^ and by {6) = Vgi{0) the score function. Since (21) provides an 


approximation of the true likelihood, then from the theory on model misspecification (e.g. Davison 


2003, pp. 147-148) it follows that 


^/^(e -6) A AAp(0, J(0)-^ K{e) J(6»)-i), n ^ oo. 


where is the p-dimensional normal distribution with mean fj, and covariance S, 6 is the 

true parameter and 

J( 0 ) = -E{V 0 £'( 0 )}, K{e) = Var 0 {/( 0 )}, 


are the sensitive and variability matrices (Varin et ah, 2011). In the case of misspecified models, 


model selection can be performed by computing the Takeuchi Information Criterion (TIC) (e.g. 


Sakamoto et ah, 1986), that is 


TIC 


-2 


^(61) -tr{E:(6l)J-^(6>)} 


where the log-likelihood, the variability and sensitive matrices are evaluated at 6. The model with 
the smallest value of the TIC is preferred. 

In order to derive an approximate posterior distribution for the parameters of an angular density, 
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the approximate likelihood (21) can be used within the Bayesian paradigm (see Sabourin et al. 


2013). Briefly, let q{6) be a prior distribution on 0, then the posterior distribution of the angular 

Y\T=ih{w(i),0)q{e) 


density’s parameters is 


q{e\w) = 


(23) 


!eWT=iKw{{), 0 ) q{e) de' 

With the angular density models in Section ([^ the analytical expression of q{6\w) can not be 
derived. Therefore, we use a Markov Chain Monte Carlo method for sampling from an approxi¬ 


mation of q{0\w). Specifically, we use a Metropolis-Bastings simulating algorithm (e.g. Hastings 


1970). With the pairwise beta models we use the prior distributions described by Sabourin et al. 


(2013). With the titled Dirichlet and Hiisler-Reiss model we use independent zero-mean normal 
prior distributions with standard deviations equal to 3 for log aj and log Ajj with i,j £ I. For the 
extremal-t model we use independent zero-mean normal prior distributions with standard devia¬ 
tions equal to 3 for sign(/?jj)logit(p^^) with i,j G I, where sign(x) is the sign of x for x G M and 
logit(x) = log(x/(l — x)) for 0 < X < 1, and a zero-mean normal prior distribution with standard 


deviations equal to 3 for logz^. Similar to Sabourin et al. (2013), for each models’ parameter we 
select a sample of 50 x 10^ observations from the approximate posterior, after a burn-in period 
of length 30 x 10^. These sizes have been determined using the Geweke convergence diagnostics 
(jCeweke , 1992) and the Heidelberger and Welch test (fHeidelberger and Welch[ 1981) respectively. 


Model selection is performed using the Bayesian Information Criterion (BIC) (e.g. Sakamoto 


et al., 1986), that is 


BIC = —2i{6) -|-p{logm -|- log(27r)}, 

where p is the number of parameters and m is the sample size. The model with the smallest value 
of the BIC is preferred. 

5 Real data analysis: Air quality data 


We analyze the extremal dependence of the air quality data, recorded in the city centre of Leeds, 
UK. The aim is to estimate the probability that multiple pollutants will be simultaneously high in 


the near future. This dataset has been previously studied by Heffernan and Tawn (2004), Boldi and 


Davison (2007) and Cooley et al. (2010). The data are the daily maximum of five air pollutants: 


particulate matter (PMIO), nitrogen oxide (NO), nitrogen dioxide (N02), ozone (03), and sulfur 
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dioxide (S02). Levels of the gases are measured in parts per billion, and those of PMIO in micro¬ 
grams per cubic meter. We focus our analysis on the winter season (from November to February) 
from 1994 to 1998. 

A preliminary analysis focuses on the data of triplets of variables. For brevity we only report 
the results of the most dependent triplets: PMIO, NO, S02 (PNS), N02, S02, NO (NSN) and 
PMIO, NO, N02 (PNN). For each variable, the empirical distribution function is estimated with 


the data below the 0.7 quantile and a GPD is fitted to the data above the quantile (Cooley et al. 


2010). Then, each marginal distribution is transformed into a unit Frechet. The coordinates of the 


data-points are transformed to radial distances and angular components. For each triplet, the 100 
observations with the largest radial distances are retained. The angular density models in Section 
are fitted to the data using the methods in Section 
The results are presented in Table Maximum likelihood estimates are similar to the estimated 
posterior means and the estimated posterior standard deviations are typically larger than the 


standard errors. For PNS we obtain the same maximum likelihood estimates as Cooley et al. (2010) 
with the pairwise beta model, however we use (Q to compute the variances of the estimates and 
so we attain larger standard errors than they do. Both the TIC and BIC lead to the same model 
selection. The Hiisler-Reiss model provides the best fit for all the groups of pollutants. 

From top to bottom. Figure displays the angular densities, computed with the posterior 
means. From left to right the Hiisler-Reiss, the tilted Dirichlet and the pairwise beta densities are 
reported. With PNS, we see that there are many observations along the edge that link PMIO and 
NO, revealing strong dependence between these two pollutants. There are also several observations 
on the S02 vertex, reflecting that this pollutant is mildly dependent with the other two. There 
are also some data in the middle of the simplex, indicating that there is mild dependence among 
the pollutants. Similarly, with NSN we see that there is strong dependence between NO and N02, 
because there are many observations along the edge that link them. There is a mild dependence 
between S02 and the other pollutants, because there is a considerable amount of data on the 03 
vertex. Overall, there is mild dependence among the pollutants, because there is a small amount of 
data in the middle of the simplex. With PNN we see that most of the observations are placed on 
the middle of the simplex revealing an overall strong dependence among the pollutants. There is a 
small amount of data along the edge that link N02 and NO and on the PMIO vertex. This reflects 
more dependence between N02 and NO than between N02 and PMIO and PMIO and NO. All 
these features are well captured by the angular densities estimated using the Hiisler-Reiss model. 
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Model Method Estimates i{0) TIC/BIC 


TD 


Oil 

0-2 

as 



PNS 

L 

1.20(0.24) 

0.67(0.07) 

0.41(0.08) 

199.63 - 

-399.21 


B 

1.22(0.25) 

0.68(0.11) 

0.42(0.09) 


-379.90 

NSN 

L 

0.85(0.12) 

0.39(0.08) 

0.90(0.11) 

200.84 - 

-401.63 


B 

0.86(0.15) 

0.39(0.09) 

0.81(0.15) 


-382.32 

PNN 

L 

1.43(0.28) 

1.55(0.31) 

1.28(0.20) 

186.35 - 

-372.64 


B 

1.45(0.30) 

1.57(0.28) 

1.29(0.23) 


-353.36 

PB 


Pi,2 

^1,3 

02,3 

a 


PNS 

L 

3.21(0.70) 

0.47(0.05) 

0.45(0.04) 

0.68(0.06) 95.95 - 

-191.87 


B 

3.31(1.13) 

0.48(0.11) 

0.46(0.10) 

0.68(0.09) 

-166.10 

NSN 

L 

0.40(0.03) 

3.74(1.77) 

0.50(0.05) 

0.64(0.05)102.59 - 

-205.13 


B 

0.40(0.09) 

4.00(1.72) 

0.51(0.12) 

0.64(0.08) 

-179.36 

PNN 

L 

3.75(1.38) 

0.71(0.09) 

3.18(1.21) 

1.35(0.18) 84.31 - 

-168.55 


B 

3.83(1.75) 

0.72(0.16) 

3.70(1.80) 

1.37(0.20) 

-142.66 

HR 


Ai,2 

Ai,3 

-^2,3 



PNS 

L 

0.65(0.06) 

0.90(0.04) 

0.98(0.03) 

234.51 - 

-468.93 


B 

0.65(0.04) 

0.90(0.04) 

0.98(0.04) 


-449.67 

NSN 

L 

1.00(0.04) 

0.56(0.04) 

0.96(0.04) 

251.80 - 

-503.54 


B 

1.00(0.04) 

0.57(0.03) 

0.97(0.04) 


-484.25 

PNN 

L 

0.60(0.05) 

0.70(0.04) 

0.51(0.03) 

198.23 - 

-396.38 


B 

0.60(0.03) 

0.70(0.04) 

0.51(0.03) 


-377.11 

ET 


pi,2 

Pi,3 

P2,3 

d 


PNS 

L 

0.87(0.02) 

0.74(0.03) 

0.66(0.03) 

3.89(0.51)152.13 - 

-304.18 


B 

0.87(0.02) 

0.77(0.02) 

0.72(0.01) 

4.02(0.35) 

-275.13 

NSN 

L 

0.58(0.04) 

0.87(0.02) 

0.64(0.03) 

3.50(0.01)141.92 - 

-283.80 


B 

0.72(0.01) 

0.89(0.02) 

0.73(0.02) 

4.00(0.33) 

-242.50 

PNN 

L 

0.88(0.02) 

0.82(0.02) 

0.89(0.01) 

3.70(0.78)180.74 - 

-361.38 


B 

0.86(0.02) 

0.78(0.03) 

0.87(0.02) 

3.21(0.43) 

-330.33 


Table 1: Summary of the extremal dependence models fitted to the UK air pollution data. For 
each angular density model the estimation results of the triplets of pollutants are reported. L and 
B denote the approximate likelihood and Bayesian inferential method. Estimates are maximum 
likelihood (standard errors) and posterior means (standard deviations). 


With this analysis we found that 03 is only weakly dependent with the other pollutants. This 


result was also found by Heffernan and Tawn (2004). Then, the second part of the analysis focuses 
only on PMIO, NO, N02 and S02. Now, because a larger number of parameters needs to be 


estimated, then the 200 observations with the largest radial distances are selected (see Cooley 


et al., 2010). Table presents the estimation results. For brevity we only report the maximum 


value of the log-likelihood, the TIC and the BIC. The Hiisler-Reiss model provides the smallest 
values of the TIC and BIC, revealing again that it better fits the pollution data. Accordingly 
hereafter calculations will be made using this model and the estimates obtained with the Bayesian 
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Figure 3: Estimated angular densities in logarithm scale. Dots represent the largest 100 observations. 



Tilted Dirichlet 

Pairwise Beta 

Hiisler-Reiss 

extremal-t 

m 

654.3 

402.5 

762.7 

532.3 

TIC 

-1308.6 

-805.0 

-1525.3 

-1064.5 

BIG 

-1280.0 

-753.4 

-1475.5 

-974.7 


Table 2: Summary of the extremal dependence models fitted to the UK air pollution data. 


approach. 


We summarize the extremal dependence of the four variables using the extremal coefficient (12) 


and the coefficient of tail dependence (15). Specifically, "d = 2.267 with a 95% credible interval is 


equal to (1.942,2.602) and x = 0.242 with a 95% credible interval is (0.150,0.361). These results 
suggest a strong extremal dependence among the pollutants. The estimated extremal dependence 
can be used in turn to estimate the probability that multiple pollutants exceed a high threshold. 
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Event 1 


Event 2 


Event 3 


Excess / n 18/528 14/562 12/528 

Emp. Est. 0.034 (0.019,0.050) 0.025 (0.012,0.038) 0.023 (0.010,0.035) 

Mod. Est. 0.038 0.030 0.030 


Table 3: Probability estimates of excesses. The first row reports the number of excess and the sample 
size. The second row reports the empirical estimates and between brackets the 95% confidence 
intervals obtained with the normal approximation. The third row reports the model estimates. 


Consider a value y whose radial component is a high threshold tq. Then, the probability of falling in 
the failure region (13) is approximately equal to the right hand side of Because the exponent 


function is related to the tail function by the inclusion-exclusion principle, then using (19) we have 


pr/Ti > yi,... ,Yd> Vd} (^Xkj 

j=i yj 


+ 


^ogVk/Vj 


2A 




kei. 


; Aj}; 


(24) 


where is the survival function of the multivariate normal distribution (Nikoloulopoulos et al 


2009). Similar to [Cooley et al. (2010) we dehne three extreme events: {PMIO > 95, NO > 
270, S02 > 95}, {N02 > 110,802 > 95, NO > 270} and {PMIO > 95, NO > 270, N02 > 


110,802 > 95}. Then, we compute probability (24) using in place of the parameters their esti¬ 
mates. Table [^ reports the results. For the three events the estimates fall inside the 95% confidence 
intervals highlighting the ability of the model to estimate such extreme events. 


The right-hand side of (24) can also be used for estimating joint return levels. In the univariate 
case see 


Coles (2001, pp.49-50). In the multivariate case different definitions of return levels may 


be available (Johansen, 2004). Let J C I, {xi,i G I\J} be a sequence of fixed high thresholds 


and p G (0,1) be a fixed probability. Given a return period 1/p, we define joint return levels the 
quantiles {yj-p,j G J} that satisfy the equation 


p = pr(Yj- > yj.p, Yi > Xi, j £ J,i £ I\J). 

Figure 1^ displays univariate and bivariate joint return level plots. When J = {j}, with j £ I, 
then the joint return level plot displays yj-p against 1/p for different values of p. When J = {i,j}, 
with i,j £ I, then for different values of 1/p the contour levels of {yi-p,yj-p) are displayed. With 
solid lines, the top-left and right panels of Figure [^report the estimated return levels of N02 and 
PMIO jointly to the extreme events {802 > 95, NO > 270} and {NO > 270, N02 > 110, 802 > 95} 
respectively. The dots are the empirical estimates and the red solid lines are the pointwise 95% 
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confidence intervals. These are computed using the normal approximation when p > 0.02 and using 
exact binomial confidence intervals when p < 0.02. The bottom-left and right panels report the 
contour levels of the return levels for (N02, S02) and (PMIO, NO) jointly to events {NO > 270} 
and {N02 > 110, S02 > 95} respectivelly. 

The joint return level can be interpreted as follows. For example, from the top-right panel we 
have that the 50 years joint return level of PMIO is 166. Concluding, we expect that PMIO will 
exceed the level 166 together with the event that NO, S02 and N02 simultaneously exceed the 
levels 270, 95 and 110 respectively, on average every 50 years. 


6 Computational details 


The figures and the estimation results have been obtained using the free software R (Team, 2013) and 


in particular the package ExtremalDep, available at https: //r-f orge. r-project. org/projects/extremaldep/. 
Bayesian estimation is obtained using and extending some routines of the package BMAmev. The left 
and middle panels of Figure were obtain using the routines scatterSd and polygonSd of the 





Figure 4: Joint return level plots of single components N02 and PMIO and of the two components 
(S02, N02) and (NO, PMIO) 
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package plotSD. 
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